GRAB: generalized region assigned to binary
Abstract
Scale is one of the major challenges in recognition problems. For example, a face captured at a large distance is considerably harder to recognize than the same face captured at a small distance. Local binary pattern (LBP) and its variants have been successfully used in face detection, recognition, and many other computer vision applications. While LBP features have been shown to be discriminative in face recognition, their pixel-level description is sensitive to changes in image scale. In this work, we extend the utility of a generalized variant of the LBP feature descriptor called generalized region assigned to binary (GRAB), introduced in an earlier article, and show that it handles the challenges due to scale. The original LBP operator is defined with respect to the surrounding pixel values, while the GRAB operator is defined with respect to overlapping surrounding regions. This gives a more general description and more flexibility in choosing the right operator for varying imaging conditions such as scale variations. We also propose a way to automatically select the scale of the GRAB operator (the size of the neighborhood). A pyramid of multi-scale GRAB operators is constructed, and the operator at each scale is applied to an image. The operator's scale is selected based on the number of stable pixels at different levels of the multi-scale pyramid. Stable pixels are defined as the pixels for which the GRAB value remains the same even as the GRAB operator scale changes. In addition to the experiments in the earlier article, we apply basic LBP, Liao et al.'s multi-scale block (MB)-LBP, and the GRAB operator to face recognition across multiple scales and demonstrate that GRAB significantly outperforms basic LBP and is more stable than MB-LBP in cases of reduced scale on subsets of the well-known labeled faces in the wild (LFW) database.
We also perform experiments on the standard LFW database using the strict LFW protocol and show the improved performance of the GRAB descriptor compared to LBP and Gabor descriptors.

Author affiliations: 1 Department of Computer Science, University of Colorado at Colorado Springs, Colorado, USA; 2 Securics Inc, Colorado Springs, Colorado, USA

Introduction
One of the theoretical challenges in recognition is the extraction of features that are sufficiently discriminative while also being invariant to variables such as illumination, translation, rotation, and scale. This work presents a feature descriptor designed primarily to handle the challenges due to scale, in addition to those due to illumination and noise, and applies the descriptor to face recognition on low-scale images. Scale is critical in unconstrained face recognition since, in general, subjects may be at different distances from the camera, and the difference between a subject at 4 ft and one at 40 ft is a 10-fold change in scale. In this work, we present a new description based on the original local binary pattern (LBP), which combines micro-structure and global structure, as well as the structure at multiple scales of the face images. We call this operator generalized region assigned to binary (GRAB) and use it to extract features for facial recognition in images of varied scales. The prior extensions that produce the 'multi-resolution' LBP [1] simply used a larger neighborhood 'circle' but sampled the raw pixels on that circle. While this did consider pixels at greater distances, sampling does not mimic changes in resolution or scale. Our neighborhood operator overcomes this limitation by defining the pixels in terms of varied sizes of overlapping regions.

What is the impact of scale on face recognition?
We conducted a small experimental analysis to see the impact of scale on face images.
© 2013 Sapkota and Boult; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Sapkota and Boult EURASIP Journal on Image and Video Processing 2013, 2013:35 http://jivp.eurasipjournals.com/content/2013/1/35

To reduce the number of variables contributing to recognition score differences, we took a subset of images from the labeled faces in the wild (LFW) database, normalized them to a size of 150 × 130, downscaled the images to multiple scales, and upscaled them back to the same size. The gallery and the probe consisted of the same images from the same subjects. The only variable is the scale. The gallery consisted of images of size 150 × 130, while the probe consisted of images of size 15 × 13 and 30 × 26. We extracted basic LBP features [2,3], multi-block (MB)-LBP features [4], and GRAB features and used a support vector machine (SVM) classifier for classification. The gallery and probe consisted of 1,830 images from 610 subjects from the LFW dataset. We discuss this subset of the LFW database further in the 'Experiments and results' section. We observed that GRAB features were more discriminative than the basic LBP features and MB-LBP features on low-scale images. At the scale of 30 × 26, 8 of the 1,830 images were misclassified using LBP, while with proper selection of the GRAB scale, all 1,830 images were correctly classified. At the scale of 15 × 13, 252 of the 1,830 probe images were misclassified using LBP, while GRAB achieved 100% accuracy. At the scale of 150 × 130, LBP and GRAB both achieve 100% accuracy. Figure 1 shows examples of samples that were misclassified using LBP but correctly classified using GRAB. We also observed that the GRAB features are more stable across multiple scales compared to MB-LBP features.
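The downscale-then-upscale step used to isolate scale as the only variable can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not state which resampling method was used, so block averaging for downscaling and nearest-neighbor replication for upscaling are assumptions, and the function names (`downscale_mean`, `upscale_nearest`, `simulate_scale`) and the random test image are hypothetical.

```python
import numpy as np

def downscale_mean(img, factor):
    """Downscale by averaging non-overlapping factor x factor blocks (assumed method)."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale_nearest(img, factor):
    """Upscale by repeating every pixel factor times along both axes (assumed method)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def simulate_scale(img, factor):
    """Return an image of the original size that only carries low-scale detail."""
    return upscale_nearest(downscale_mean(img, factor), factor)

# Stand-in for a normalized 150 x 130 gallery face (random values, illustration only).
rng = np.random.default_rng(0)
gallery = rng.integers(0, 256, size=(150, 130)).astype(np.float64)

probe_30x26 = simulate_scale(gallery, 5)   # 150 x 130 -> 30 x 26 -> 150 x 130
probe_15x13 = simulate_scale(gallery, 10)  # 150 x 130 -> 15 x 13 -> 150 x 130
```

Because the probe is brought back to the gallery size, histogram-based features from both images cover the same grid, and any difference in the features comes from the lost detail alone.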
In [4], the authors did not use a scale-selection algorithm; instead, they applied a boosting algorithm after extracting multiple MB-LBP features. Therefore, we compare our GRAB features with MB-LBP features across multiple scales. The results in Table 1 show that GRAB is more stable than MB-LBP under changes in scale. This analysis was done on a very small dataset with scale as the only variable. The impact of scale on accuracy for larger datasets with more variations can be much greater. This shows that the choice of feature descriptor is critical for low-scale images.

The main contributions of our work are as follows:
1. Definition of GRAB as a generalized operator for feature description;
2. A method for selecting the operator's scale space;
3. Demonstration of higher accuracy of the GRAB descriptor compared to existing methods on low-scale images.

Related work
A lot of work has been done in the past on describing meaningful and distinctive image features that can be used for recognition. Local binary pattern (LBP) is an operator that was originally used to extract a texture description from imagery and is now widely used in face recognition. The operator assigns a label to every pixel of an image by thresholding the 3 × 3 neighborhood of each pixel with the center pixel value, resulting in a binary number [2,3]. The pixel-level features thus obtained are combined in the form of histograms in various ways to generate the global features for the face description. LBP has been one of the best-performing descriptors because it captures the microstructure as well as the macrostructure of the face image. Despite its popularity, the LBP approach has a number of shortcomings, including sensitivity to noise, scale changes, and rotation of the image. One extension of LBP, the multiresolution LBP [1], uses a larger neighborhood circle but still samples the raw pixels on that circle.
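The basic 3 × 3 LBP labeling and histogram described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not code from [2,3]: the neighbor ordering, the use of `>=` for the threshold test, and the names `OFFSETS`, `lbp_3x3`, and `lbp_histogram` are choices made here, and border pixels are simply skipped.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbors, clockwise from the top-left.
# The ordering is an assumption; any fixed ordering yields an equivalent descriptor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(img):
    """Return the 8-bit LBP code of every interior pixel of a 2-D image."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view that aligns each neighbor with its center pixel.
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Threshold the neighbor against the center and set the matching bit.
        codes |= (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(codes):
    """256-bin histogram of LBP codes, usable as a simple global feature."""
    return np.bincount(codes.ravel(), minlength=256)
```

In face recognition, the image is typically divided into cells and the per-cell histograms are concatenated; the single global histogram here is kept only for brevity.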
While multiresolution LBP does consider pixels at greater distances, sampling does not properly model changes in resolution or scale, in which pixels are combined rather than sampled. Consider what happens in a region with a fine binary texture, where sampling chooses one of the two binary colors but changes in scale actually mix the values into new shades/colors. In [5], this multi-resolution LBP is combined with novel color representations that combine the RGB, YCbCr, and YIQ color spaces. The results did improve performance on the FRGC data, but that data did not actually contain multiple resolutions, so sampling artifacts in color space would impact those experiments.

Figure 1 Impact of scale on face images. Faces misclassified using standard LBP and correctly classified using GRAB. The gallery and probe consist of the same set of images, differing only in scale. Probe images are low-scale images that are resized to a higher scale to match the size of the gallery images, since we use a histogram-based method. The top row consists of probe images of size 30 × 26, and the bottom row consists of probe images of size 15 × 13. Both gallery and probe images are resized to 130 × 150 for matching. Images on the left are probe images, and images on the right are gallery images. The images in the red box are misclassified using standard LBP, and the images in the green box are correctly classified using GRAB.

Table 1 Classification accuracy of LBP, MB-LBP, and GRAB on images from a subset of the LFW database at multiple scales
Image size | 150 × 130 | 30 × 26 | 30 × 26 | 30 × 26 | 30 × 26 | 15 × 13 | 15 × 13 | 15 × 13 | 15 × 13
Features | G1, P1 | G1, P1 | G3, P1 * | G5, P1 | G3, P3 | G1, P1 | G3, P3 | G5, P3 * | G7, P7
GRAB | 1 | 0.9956 | 1 | 1 | 1 | 0.8622 | 0.9685 | 0.9978 | 1
LBP | 1 | 0.9956 | - | - | - | 0.8622 | - | - | -
%Gain | 0 | 0 | 0.44 | 0.44 | 0.44 | 0 | 12.32 | 15.72 | 15.98
MB-LBP | 1 | 0.9956 | 0.9972 | 0.9945 | 1 | 0.8622 | 0.8950 | 0.9464 | 0.9994
%Gain | 0 | 0 | 0.28 | 0.55 | 0 | 0 | 8.21 | 5.43 | 0.06
The gallery and probe images are the same; the only difference is the scale. All gallery images are of size 130 × 150, whereas probe images are of sizes 130 × 150, 30 × 26, and 15 × 13. The columns of the table show the multiple scales of the operators. For example, (G5, P1) means that the scale of the operator is 5 for the gallery and 1 for the probe, which means gallery images are smoothed with a window size of 5 to match the unknown smoothing present in the probe. The columns marked with an asterisk are the operator scales automatically selected by our scale-selection algorithm, described later in this paper. Since MB-LBP has no such selection mechanism other than the boosting algorithm, we compared the algorithms at multiple scales. Since LBP does not allow the averaging operator, we mark those fields with hyphens. According to these results, GRAB is more stable across scales than LBP and MB-LBP.

Studies have introduced the concept of MB-LBP to provide a more robust operator than LBP [4]. In MB-LBP, the average sum of image intensity is computed in each subregion around a center subregion. These average sums are then compared with the center block. The authors note that 'MB-LBP can be viewed as a certain way of combination using eight ordinal rectangle features'. While MB-LBP does improve recognition by representing a mixture of the microstructure and macrostructure of the image pattern, its authors did not study the impact of scale but rather focused on improving recognition at a fixed scale.

The more recently proposed BRIEF descriptors [6,7] use binary strings as feature descriptors instead of the decimal value of the binary strings, as used in basic LBP and its other variants. The binary strings are defined on smoothed patches. Binary tests between pairs of pixels are performed for classification.
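The MB-LBP comparison of block averages described above can be sketched as follows. This is a hedged illustration of the idea rather than the implementation from [4]: it assumes square s × s blocks arranged in a 3 × 3 grid, a `>=` test against the center block, and a clockwise bit ordering, and the names `block_means` and `mb_lbp` are hypothetical. It is also not the GRAB operator, which the text defines over overlapping regions.

```python
import numpy as np

# Block offsets in units of the block size s, clockwise from the top-left (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def block_means(img, s):
    """Mean of every s x s block (indexed by its top-left corner), via an integral image."""
    img = np.asarray(img, dtype=np.float64)
    # Zero row/column in front so that ii[y, x] is the sum of img[:y, :x].
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    total = ii[s:, s:] - ii[:-s, s:] - ii[s:, :-s] + ii[:-s, :-s]
    return total / (s * s)

def mb_lbp(img, s):
    """8-bit code comparing the 8 surrounding block means with the center block mean."""
    m = block_means(img, s)
    hh, ww = m.shape[0] - 2 * s, m.shape[1] - 2 * s
    if hh < 1 or ww < 1:
        raise ValueError("image must be at least 3*s pixels in each dimension")
    center = m[s:s + hh, s:s + ww]
    codes = np.zeros((hh, ww), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        y0, x0 = s + dy * s, s + dx * s
        neighbor = m[y0:y0 + hh, x0:x0 + ww]
        codes |= (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes
```

With `s = 1` every block is a single pixel and the code reduces to a basic 3 × 3 LBP; larger `s` compares averaged regions, which is what lets the operator respond to structure at a coarser scale.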
Similar to our work, the BRIEF authors highlight the importance of smoothing before extracting LBP-like features. However, they choose a fixed 9 × 9 window for their experiments. For face recognition, a limited number of pairs of test points with a fixed smoothing window may not be sufficient. Our GRAB features provide sufficient information for face recognition across multiple scales.

LBP features have also been used for face detection. The work in [8] used LBP features as a facial representation and built a face detection system using an SVM classifier. Another LBP variant used for face detection is described in [9], which uses multi-block local binary pattern features and a boosting algorithm.

Due to the peculiarities of the face shape and the variability of several aspects of the face, the face recognition problem differs from other object recognition problems. Some previous work combined local and global representations of face descriptors to solve this problem. Multi-resolution histograms of local variation patterns [10] is one such method; it describes face images as the concatenation of local spatial histograms of local variation patterns computed from multi-resolution Gabor features. Gabor features are another interesting set of features that are widely applied in face recognition [11,12]. The Gabor representation of face images incorporates multiscale feature extraction. The Gabor wavelet representation of an image is the convolution of the image with a family of Gabor wavelets at different scales; for example, Pinto et al. present a V1-like algorithm that considers 96 different Gabor filters. Local features are represented by the coefficient set, or Gabor jet, which orders the convolution results at different orientations and scales for a particular point.

The scale-invariant feature transform (SIFT) is a popular method in object recognition [13,14].
SIFT extracts the features of an image using key points that are invariant to scale change. To detect such key points, it searches for stable features across all possible scales using a scale space, and each key point is associated with location, scale, and orientation information. To define the local image features, it samples the local image intensities around each key point at the appropriate scale of that key point. Bicego et al. used SIFT features for authentication in [15], wherein they used the distance between all pairs of keypoint descriptors in the two images to define the matching score. For face authentication, this type of algorithm was not as successful as it was in other object recognition problems using SIFT-like features. Unfortunately, the planarity assumption underlying the theory of SIFT features and the highly non-planar and self-occluding nature of faces result in weak performance on face recognition tasks. In [16], SIFT features are combined in a mixed local-global strategy supporting a recognition-from-parts approach to address occlusion.

In this work, we present an operator called GRAB, developed as a generalization of LBP. While we will show the effectiveness of GRAB, like other multi-resolution approaches, it is likely to suffer from the curse of dimensionality. There are techniques for reducing dimensionality. For example, Chan et al. [17] use LDA subspace techniques to help reduce the dimensionality of standard MLBP while maintaining or increasing the accuracy gained from the added dimensionality. In terms of added accuracy, they argue that, 'However, by directly applying the similarity measurement to the multi-scale LBP histogram, the performance will be compromised. The reason is that this histogram is of high dimensionality and contains redundant information'. While Chan et al. show impressive results, in this work, we use GRAB and a scale-selection algorithm rather than MLBP to avoid sampling issues, and we use SVMs for recognition, which remove the redundancy in a different and generally more effective way. Again, our focus is on addressing recognition under scale changes, not just improving recognition rates.

GRAB
GRAB is developed as a basic operator for neighborhood modeling of a pixel. For the simple GRAB operator, with neighbors j = 1, . . . , n, we let c stand for the center pixel and j for the neighbor pixel. For each pixel c, we can define the generalized binary representation as:

Journal: EURASIP J. Image and Video Processing
Volume: 2013
Publication date: 2013